深度神经网络(DNNS)的广泛应用要求越来越多的关注对其现实世界的鲁棒性,即DNN是否抵抗黑盒对抗性攻击,其中包括基于得分的查询攻击(SQA)是最威胁性的。由于它们的实用性和有效性:攻击者只需要在模型输出上进行数十个查询即可严重伤害受害者网络。针对SQA的防御需要对用户的服务目的而略有但巧妙的输出变化,这些用户与攻击者共享相同的输出信息。在本文中,我们提出了一种称为统一梯度(UNIG)的现实世界防御,以统一不同数据的梯度,以便攻击者只能探究不同样本相似的较弱的攻击方向。由于这种普遍的攻击扰动的验证与投入特定的扰动相比,Unig通过指示攻击者一个扭曲且信息不足的攻击方向来保护现实世界中的DNN。为了增强Unig在现实世界应用中的实际意义,我们将其实现为Hadamard产品模块,该模块具有计算效率且很容易插入任何模型。根据对5个SQA和4个防御基线的广泛实验,Unig显着改善了现实世界的鲁棒性,而不会伤害CIFAR10和Imagenet上的清洁准确性。例如,Unig在2500 Query Square攻击下保持了77.80%精度的CIFAR-10模型,而最先进的对手训练的模型仅在CIFAR10上具有67.34%的速度。同时,Unig在清洁精度和输出的修改程度上大大超过了所有基准。代码将发布。
translated by 谷歌翻译
The score-based query attacks (SQAs) pose practical threats to deep neural networks by crafting adversarial perturbations within dozens of queries, only using the model's output scores. Nonetheless, we note that if the loss trend of the outputs is slightly perturbed, SQAs could be easily misled and thereby become much less effective. Following this idea, we propose a novel defense, namely Adversarial Attack on Attackers (AAA), to confound SQAs towards incorrect attack directions by slightly modifying the output logits. In this way, (1) SQAs are prevented regardless of the model's worst-case robustness; (2) the original model predictions are hardly changed, i.e., no degradation on clean accuracy; (3) the calibration of confidence scores can be improved simultaneously. Extensive experiments are provided to verify the above advantages. For example, by setting $\ell_\infty=8/255$ on CIFAR-10, our proposed AAA helps WideResNet-28 secure 80.59% accuracy under Square attack (2500 queries), while the best prior defense (i.e., adversarial training) only attains 67.44%. Since AAA attacks SQA's general greedy strategy, such advantages of AAA over 8 defenses can be consistently observed on 8 CIFAR-10/ImageNet models under 6 SQAs, using different attack targets, bounds, norms, losses, and strategies. Moreover, AAA calibrates better without hurting the accuracy. Our code is available at https://github.com/Sizhe-Chen/AAA.
translated by 谷歌翻译
单步逆势培训(AT)受到了广泛的关注,因为它被证明是有效和健壮的。然而,存在严重的灾难性过度问题,即反对投影梯度下降(PGD)攻击的强劲准确性突然下降到培训期间的0.5美元。在本文中,我们从优化的新角度来看,首先揭示每个样品和过度装箱的快速增长梯度之间的密切联系,这也可以应用于了解多步骤中的稳健的过度拟合现象。为了控制培训期间梯度的增长,我们提出了一种新的方法,子空间对抗训练(子AT),限制了仔细提取的子空间。它成功地解决了两种过度装备,因此显着提高了鲁棒性。在子空间中,我们还允许单步合并较大的步骤和更大的半径,从而进一步提高了鲁棒性性能。因此,我们实现了最先进的单步性能:我们的纯单步可以达到超过$ \ mathbf {51} \%$鲁棒准确性,反对强大的PGD-50攻击以半径8美元/ CiFar-10上的255美元,甚至超过了标准的多步PGD-10,具有巨大的计算优势。代码已释放$ \脚注{\ url {https://github.com/nblt/sub -at}} $。
translated by 谷歌翻译
Conversational text-to-SQL is designed to translate multi-turn natural language questions into their corresponding SQL queries. Most state-of-the-art conversational text- to-SQL methods are incompatible with generative pre-trained language models (PLMs), such as T5. In this paper, we present a two-stage unified MultI-task Generation frAmework (MIGA) that leverages PLMs' ability to tackle conversational text-to-SQL. In the pre-training stage, MIGA first decomposes the main task into several related sub-tasks and then unifies them into the same sequence-to-sequence (Seq2Seq) paradigm with task-specific natural language prompts to boost the main task from multi-task training. Later in the fine-tuning stage, we propose four SQL perturbations to alleviate the error propagation problem. MIGA tends to achieve state-of-the-art performance on two benchmarks (SparC and CoSQL). We also provide extensive analyses and discussions to shed light on some new perspectives for conversational text-to-SQL.
translated by 谷歌翻译
作为自然语言处理领域(NLP)领域的广泛研究,基于方面的情感分析(ABSA)是预测文本中相对于相应方面所表达的情感的任务。不幸的是,大多数语言缺乏足够的注释资源,因此越来越多的研究人员专注于跨语义方面的情感分析(XABSA)。但是,最近的研究仅集中于跨语性数据对准而不是模型对齐。为此,我们提出了一个新颖的框架CL-XABSA:基于跨语言的情感分析的对比度学习。基于对比度学习,我们在不同的语义空间中关闭具有相同标签的样品之间的距离,从而实现了不同语言的语义空间的收敛。具体而言,我们设计了两种对比策略,即代币嵌入(TL-CTE)和情感水平的对比度学习,对代币嵌入(SL-CTE)的对比度学习,以使源语言和目标语言的语义空间正规化,以使其更加统一。由于我们的框架可以在培训期间以多种语言接收数据集,因此我们的框架不仅可以适应XABSA任务,而且可以针对基于多语言的情感分析(MABSA)进行调整。为了进一步提高模型的性能,我们执行知识蒸馏技术利用未标记的目标语言的数据。在蒸馏XABSA任务中,我们进一步探讨了不同数据(源数据集,翻译数据集和代码切换数据集)的比较有效性。结果表明,所提出的方法在XABSA,蒸馏XABSA和MABSA的三个任务中具有一定的改进。为了获得可重复性,我们的本文代码可在https://github.com/gklmip/cl-xabsa上获得。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
translated by 谷歌翻译
Nearest-Neighbor (NN) classification has been proven as a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false prediction if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation which is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support data based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show our efficient non-learning based method can outperform or at least comparable to SOTA methods which need additional learning steps.
translated by 谷歌翻译